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ABSTRACT 


The  solution  of  contemporary  large-scale  linear,  integer, 
and  mixed  integer  programming  problems  is  often  facilitated 
by  the  exploitation  of  intrinsic  special  structure  in  the 
model.   This  paper  deals  with  the  problem  of  identifying 
embedded  pure  network  rows  within  the  coefficient  matrix  of 
such  models  and  presents  two  heuristic  algorithms  for  iden- 
tifying such  structure.   The  problem  of  identifying  the  maximum 
size  embedded  pure  network  is  shown  to  be  among  the  class  of 
NP-hard  problems,  therefore,  the  polynomially-bounded 
algorithms  presented  here  do  not  guarantee  network  sets  of 
maximum  size.   However,  upper  bounds  on  the  size  of  the 
maximum  network  set  are  developed  and  used  to  evaluate  the 
algorithms.   Finally,  the  algorithms  were  tested  with  a 
number  of  large-scale,  real-world  models  and  the  results  of 
these  benchmark  runs  are  presented. 
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I.   INTRODUCTION 

The  success  of  mathematical  optimization  in  Operations 
Research  and  the  increase  in  size  and  speed  of  digital 
computers  have  led  to  the  formulation  of  very  large  and 
complex  systems  as  mathematical  programming  models.   The 
direct  solution  of  the  associated  linear  programming  (LP) 
problems  using  the  classical  simplex  method  is  often  pro- 
hibitively expensive,  if  not  impossible  in  a  practical  sense. 

The  large-scale  models  are  predominantly  characterized 
by  sparse  coefficient  matrices  and  inherent  special  structure. 
If  special  structure  can  be  identified,  it  can  often  be  used 
to  reduce  the  problem  to  a  more  manageable  size.   Methods  so 
used  to  exploit  special  structure  can  be  classified  as 
indirect  methods  or  direct  methods. 

Indirect  methods  use  special  structure  to  decompose  a 
large  problem  into  smaller  related  problems.   The  solution 
of  these  sub-problems  is  then  used  to  construct  a  solution 
to  the  larger  problem.   Indirect  methods  are  not  considered 
in  this  paper. 

Direct  methods  attempt  tc  solve  the  original  problem  more 
efficiently  by  using  specializations  of  the  simplex  procedure 
which  exploit  the  special  structure.   These  specializations 
often  replace  arithmetic  operations  with  logical  operations 
and  use  information  concerning  the  nature  of  the  eventual 


solution  provided  by  the  special  structure  to  reduce  compu- 
tational effort. 

This  paper  is  concerned  with  structures  which  can  be  used 
in  direct,  static  factorization  algorithms,  for  which  all 
simplex  bases  share  a  common  structure  under  row  partition. 
The  details  of  actual  exploitation  of  special  structure,  once 
identified,  will  not  be  discussed  here  (see  Graves  and  McBride 
[6]). 

Static  basis  factorizations  include  simple  upper  bounds, 
generalized  upper  bounds  (GUB) ,  and  embedded  network  rows, 
among  others.   Simple  upper  bound  rows  have  only  one  non-zero 
coefficient.   GUB  refers  to  a  set  of  rows  for  which  each 
column  (restricted  to  those  rows)  has  at  most  one  non-zero 
coefficient.   Embedded  network  rows  refers  to  a  set  of  rows 
for  which  each  column  (restricted  to  those  rows)  has  at  most 
two  non-zero  coefficients  of  opposite  sign.   If  the  non-zero 
coefficients  in  the  embedded  network  rows  are  restricted  to 
one  +1  and  one  -1  in  each  column,  then  the  structure  is 
referred  to  as  an  embedded  pure  network  (NET) . 

Various  transformations  are  available  to  identify  and 
exhibit  special  structure  in  the  coefficient  matrix.   These 
range  from  simple  permutation  of  rows  and  columns  to  full 
(linear)  transformations  of  the  coefficient  matrix.   An 
intermediate  method  allows  simple  scaling  (multiplication  by 
a  non-zero  constant)  of  each  row  and/or  column.   Generally, 
complete  transformation  methods  are  used  in  an  attempt  to 


convert  the  entire  coefficient  matrix  to  one  having  a  very 
special  structure,  such  as  a  node-arc  incidence  matrix  for 
a  network.   Partial  transformation  methods  look  for  large 
subsets  of  rows  in  the  coefficient  matrix  which  exhibit  the 
desired  structure,  with  the  implicit  presumption  that  large 
subsets  are  more  efficiently  exploited  than  small  subsets. 

As  mentioned  earlier,  much  of  the  computational  improve- 
ment of  the  specialized  simplex  algorithms  is  obtained  when 
logic  can  be  substituted  for  arithmetic  in  simplex  operations. 
This  is  most  conveniently  accomplished  when  the  coefficient 
values  in  the  rows  of  the  special  structure  set  are  restricted 
to  0,  ±1  .   This  restriction  can  be  satisfied  by  considering 
for  inclusion  in  the  structure  set  only  those  rows  with 
intrinsic  0,  ±1  entries.   In  practice,  however,  it  is  often 
possible,  through  row  and/or  column  scaling,  to  produce  the 
desired  0,  ±1  values.   For  simple  upper  bounds,  row  scaling 
will  suffice.   GUB  sets  can  be  converted  with  row  and  column 
scaling  (except  that  columns  corresponding  to  integer  varia- 
bles are  not  customarily  scaled) .   To  produce  pure  network 
rows,  however,  the  scaling  problem  is  non-trivial  due  to  the 
existence  of  two  non-zero  coefficients  in  many  columns  as 
well  as  the  requirement  that  unit  elements  in  the  same  column 
be  of  opposite  sign. 

The  use  of  generalized  upper  bounds  has  received  much 
attention  since  the  concept  was  introduced  in  1964  by  Dantzig 
and  Van  Slyke  [4] .   Some  form  of  GUB  has  been  implemented  in 


many  commercial  LP  systems,  though  restrictions  on  what 
constitutes  an  admissible  (i.e.,  implemented)  GUB  set  vary. 
Work  has  been  done  in  the  automatic  identification  of  GUB 
sets  [2],  [8];  computational  results  on  large-scale  problems 
indicate  that  this  is  not  only  feasible,  but  can  be  extremely 
advantageous  [3] ,  [13] . 

This  paper  extends  the  research  done  in  identifying  GUB 
sets  to  finding  sets  of  embedded  pure  network  rows .   In  pure 
network  rows,  coefficients  are  restricted  to  the  values  0,±1 
and  columns  with  two  non-zero  entries  must  have  exactly  one 
+1  entry  and  one  -1  entry.   Although  some  recent  work  has  been 
done  in  the  theory  of  complete  conversion  of  a  linear  program 
to  a  network  problem  (  [1] ,  [10]),  few  practical  results  have 
been  achieved  which  reliably  identify  a  subset  of  rows  which 
form  a  network  structure  if  complete  conversion  fails.   An 
efficient  algorithm  for  doing  so  is  of  great  value  since  the 
problem  usually  fails  to  be  completely  convertible,  and  since 
the  expense  of  attempting  complete  conversion  may  be 
prohibitive. 

The  problem  of  finding  a  maximum  GUB  set  (in  terms  of 
number  of  rows)  within  a  general  coefficient  matrix  has  been 
shown  by  Thomen  to  be  NP-hard  in  complexity  [13] .   This  paper 
proves  the  same  result  for  the  maximum  embedded  pure  network 
problem.   The  implication  is  that  currently  only  exponential 
time  algorithms  exist  to  solve  these  types  of  problems  and 
the  hope  of  finding  a  more  efficient  algorithm  is  dim. 


Therefore,  the  efficient  GUB-finding  methods  developed 
to  date  have  been  heuristic  algorithms.   That  is,  they  use 
methods  which  attempt  to  find  large,  sometimes  even  maximum 
GUB  sets,  but  which  cannot  guarantee  a  maximum  GUB  set.   Since 
the  size  of  the  maximum  GUB  set  is  not  known  for  the  large- 
scale  problems  with  which  we  work,  we  must  resort  to  compari- 
son with  upper  bounds  on  the  size  of  the  maximum  set  to 
evaluate  the  heuristics  [13] . 

The  methods  developed  in  this  paper  for  finding  pure 
network  row  sets  are  also  heuristic  algorithms.   Bounds  are 
developed  for  the  maximum  number  of  pure  network  rows,  in 
order  to  evaluate  the  NET  identification  algorithms. 

Computational  results  are  given  for  a  number  of  large- 
scale,  real-world  problems.   They  show  the  NET  identification 
algorithms  to  be  very  efficient  and  effective  in  identifying 
large  sets  of  pure  network  rows. 

Portions  of  this  research  have  been  published  in  abstract 
form  in  [14 ] . 
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II.   PROBLEM  DEFINITION  AND  REPRESENTATIONS 

A.   THE  LINEAR  PROGRAMMING  PROBLEM 

The  Linear  Programming  Problem  is  defined  here  as 


(L)         minimize   c  x 


s.t.     r   Ax  <_  r      (ranged  constraints) 
b  <_  x  <_  b      (simple  bounds) 

where  r  and  r  are  m-vectors,  x,  c,  b  and  b  are  n-vectors  and 
A  is  an  m  x  n  matrix.   Consider  for  the  moment  the  case  where 
all  x  variables  are  real-valued;  the  integer  and  mixed  integer 
cases  will  be  discussed  later. 

B.   THE  GUB  PROBLEM 

The  (maximum)  GUB  problem  for  (L)  can  be  stated  as: 

(GUB)    Find  a  (maximum)  subset  of  rows  in  A  which 
can  be  scaled  to  contain  only  0,+l  entries 
and  which  satisfy  the  property  that  each 
column  of  A  (restricted  to  those  rows)  has 
at  most  one  non-zero  entry. 

The  real  values  of  the  non-zero  coefficients  in  A  do  not 
make  a  difference  in  the  GUB  problem,  since  any  non-zero  entry 
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in  a  GUB  row  can  be  scaled  to  +1  by  column  scaling  alone. 
Therefore,  it  is  convenient  to  replace  A  by  a  binary  (0,1) 
matrix,  K,  of  the  same  dimension  where  each  non-zero  entry 
of  A  is  replaced  by  +1  with  all  other  entries  zero. 

Using  the  matrix  K,  with  entries  k. .  ,  the  (maximum)  GUB 
problem  can  be  formulated  as  the  binary  integer  program 


(GUBI)     (maximize)   z,  +  z2  + 


.  .  .  +  z 


m 


s.t.    T  k . . z .   <   1;     j=l,...,n 


where     z.  e  {0,1} 


The  variable  z.  is  an  indicator  variable  for  GUB  inclusion, 
i.e.,  if  z.  =  1,  row  i  is  included  in  the  GUB  set,  otherwise 
it  is  not. 

C.   REPRESENTATIONS  OF  THE  GUB  PROBLEM 

Alternative  representations  of  the  GUB  problem  have  been 
developed  as  the  basis  for  various  heuristic  algorithms  and 
for  theoretical  considerations  such  as  determining  the  com- 
plexity of  the  problem  and  developing  bounds  on  the  maximum 
achievable  size  of  a  GUB  set.   Some  of  these  representations, 
covered  in  detail  in  [13] ,  are  the  graphical  conflict  repre- 
sentation, the  conflict  matrix  representation,  and  the  vector 
space  representation. 
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The  concept  of  row  conflicts  is  basic  to  the  GUB  problem. 
Two  rows  in  A  are  said  to  conflict    if  there  is  at  least  one 
column  of  A  with  non-zero  entries  in  both  rows.   If  each  row 
of  A  is  considered  as  a  vertex  in  an  undirected  graph  with 
two  vertices  connected  by  an  edge  whenever  the  corresponding 
rows  conflict,  then  the  (maximum)  GUB  problem  becomes  one  of 
finding  a  (maximum)  independent  set  of  vertices  in  the  graph. 
An  independent  set  of  vertices  in  a  graph  is  a  subset  of  the 
total  vertex  set  with  no  two  vertices  adjacent  (connected  by 
an  edge)  in  the  graph. 

The  conflict  matrix  representation  of  the  GUB  problem 
uses  an  m  x  m  symmetric  binary  matrix  M  with  each  row  and 
column  representing  a  row  of  A.   M  has  +1  values  in  those  i,j 
entries  where  row  i  and  row  j  conflict  in  A.   By  definition, 
every  row  conflicts  with  itself  so  the  main  diagonal  of  M 
has  all  +1  entries.   The  (maximum)  GUB  problem  then  becomes 
one  of  finding  (through  permutation  of  the  rows  of  A)  an 
embedded  identity  matrix  (of  maximum  size)  in  the  conflict 
matrix  M. 

The  vector  space  representation  has  developed  from  a  paper 
by  Senju  and  Toyoda  [12]  dealing  with  the  heuristic  solution 
of  certain  0,1  programming  problems.   Each  row  of  K  is  con- 
sidered as  a  vector  in  n-space  having  unit  length  in  those 
directions  corresponding  to  its  non-zero  entries.   The  vector 
R  is  formed  as  the  sum  of  each  of  the  row  vectors.   A  unit 
hypercube  in  n-space  situated  at  the  origin  with  length  1  in 
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all  positive  directions  represents  the  feasible  GUB  region. 
If  R  extends  beyond  this  region,  the  set  of  rows  is  not  a 
GUB  set  and  at  least  one  row  must  be  removed  to  bring  R  into 
the  feasible  region.   The  (maximum)  GUB  problem  becomes  one 
of  determining  (the  minimum  number  of)  rows  which  must  be 
removed  in  order  to  bring  R  into  the  feasible  region.   The 
heuristics  based  on  this  representation  compute  gradient  vec- 
tors which  indicate  the  direction  of  shortest  distance  to  the 
feasible  region  and  remove  first  those  rows  which  produce 
the  greatest  movement  in  that  direction.   These  heuristics 
have  proven  to  be  the  fastest  to  date  and  produce  GUB  sets 
comparable  in  quality  to  other  heuristics  [13] . 

D.   THE  NET  PROBLEM 

The  (maximum)  Embedded  Pure  Network  problem  for  (L)  can 
be  stated  as: 

(NET)   Find  a  (maximum)  subset  of  rows  in  A  which  can 
be  scaled  to  contain  only  0,±1  entries  and 
which  satisfy  the  property  that  each  column 
of  A  (restricted  to  those  rows)  has  at  most 
two  non-zero  entries,  and  if  the  column  has  two 
non-zero  entries,  the  (scaled)  entries  must  be 
of  opposite  sign. 

The  real  values  of  the  non-zero  coefficients  in  A  cannot  be 
ignored  as  they  were  in  the  GUB  problem  since  simple  column 
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scaling  is  no  longer  sufficient  to  produce  the  required  ±1 
entries  in  columns  containing  two  non-zero  entries.   The 
addition  of  row  scaling  may  help,  but  even  this  is  not  suffi- 
cient to  guarantee  that  a  network  set  of  rows  obtained  by 
considering  only  the  signs  of  the  non-zero  elements  can  be 
scaled  to  the  required  0,±1  values. 

Considering,  for  the  moment  only,  matrices  with  0,±1 
entries  (or  a  subset  of  m  rows  with  0,±1  entries  in  a  general 
matrix)  with  no  scaling  allowed,  the  (maximum)  NET  problem  can 
be  formulated  as  the  binary  integer  program: 


(NETI)         maximize    z,  +  z-  +  ...  +  z 

12  m 


s.t.      I  z.      1;     j=l,...,n 

i:a. .=-1   1 


i:a. .=+1 


Z.   <_   1;     j  =  1,  .  .  .  ,n 


where  z.   e   {0,1} 


Again,  z.  is  an  indicator  variable  for  inclusion  in  the 
network  set. 

E.   REPRESENTATIONS  OF  THE  NET  PROBLEM 

Unfortunately,  NET  does  not  lend  itself  to  the  many  repre- 
sentations which  GUB  admits.   The  primary  reason  for  this  is 
that  the  scaling  problems  associated  with  NET  make  it  impossi- 
ble to  disregard  the  real  values  of  the  non-zero  coefficients 
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in  A.   Also,  the  concept  of  pairwise  row  conflicts  so  useful 
in  the  GUB  algorithms  does  not  apply  to  network  rows  when  row 
scaling  is  allowed. 

To  efficiently  confront  the  scaling  dilemma,  we  are 
currently  forced  to  restrict  the  eligibility  of  rows  for 
membership  in  the  network  set.   The  most  obvious  restriction 
is  to  allow  no  scaling  and  consider  only  those  rows  with 
intrinsic  0,±1  entries.   This  may  be  unnecessarily  prohibi- 
tive.  Two  less  restrictive  options  are  employed  in  the 
algorithms  described  later.   These  are: 

1.  Admit  only  rows  with  intrinsic  0,±1  entries  but 
allow  row  reflection    (multiplication  of  a  row  by  -1) . 

2.  Admit  only  rows  whose  non-zero  entries  can  be  row 
scaled  to  0,±1.   This  includes  rows  with  all  non-zero 
entries  of  the  same  absolute  value.   Row  reflection 
is  also  allowed. 

For  the  algorithms  presented  in  this  paper,  two  repre- 
sentations of  the  NET  problem  are  developed.   The  double-GUB 
representation  is  used  in  the  identification  of  networks  with 
a  bipartite  structure  by  considering  the  network  set  as  con- 
sisting of  two  disjoint  GUB  sets.   The  vector  space  repre- 
sentation is  used  in  the  identification  of  general  network 
structure. 

1.   Double-GUB  Set  Representation 

As  suggested  by  Thomen  [13],  GUB  heuristics  can  be 
used  to  produce  a  bipartite  network  row  factorization.   This 
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type  of  factorization  has  the  property  that  the  rows  in  the 
network  set  can  be  partitioned  into  two  subsets,  N,  and  N^, 
such  that  each  column  of  the  matrix  has  at  most  one  non- 
zero entry  in  N.  and  at  most  one  non-zero  entry  in  N~. 
Additionally,  the  entries  must  be  of  opposite  sign.   To 
produce  such  a  factorization,  a  GUB  heuristic  can  be  applied 
to  the  eligible  rows  of  A  producing  one  GUB  set,  and  then 
to  the  remaining  eligible  rows  not  selected  in  the  first 
pass,  producing  a  second  GUB  set.   This  can  be  done  if  inclu- 
sion in  the  second  GUB  set  is  conditioned  on  agreement  with 
rows  in  the  first  GUB  set,  allowing  row  reflection  if  necessary 
This  algorithm  (entitled  D-GUB)  was  implemented  using  the 
fastest  of  the  GUB  heuristics. 

2 .   Vector  Space  Representation 

If  we  consider  only  the  rows  of  A  with  0,±1  entries, 
or  those  which  have  been  scaled  to  0,±1,  a  vector  space 
representation  for  NET  can  be  developed  similar  to  that 
developed  for  GUB.   The  representation  can  also  allow  reflec- 
tion of  rows,  if  desired. 

With  each  row  in  the  eligible  set,  we  associate  two 

+      —    +  . 

vectors  in  n-space,  V.  and  V..   V.  is  the  vector  consisting 

of  1  in  those  dimensions  corresponding  to  +1  entries  in  row  i 

and  zero  in  all  other  dimensions.   V.  is  the  vector  consisting 

of  -1  in  those  dimensions  corresponding  to  -1  entries  in  row 

i  and  zero  in  all  others.   For  example,  if  row  i  is  (1,0,-1, 

1,0,-1,0),  then  V+  =  (1,0,0,1,0,0,0)  and  V~  =  (0,0,-1,0,0,-1,0) 
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We  define  R  as  the  resultant  vector  from  the  sum  of 

all  V.  and  R  as  the  resultant  vector  from  the  sum  of  all 

1 

V. .   These  vectors  extend  from  the  origin  into  the  orthants 
of  n-space  corresponding  to  all  positive  dimensions  and  all 
negative  dimensions,  respectively.   Two  hypercubes  in  this 
n-space,  one  situated  at  the  origin  with  length  1  in  all 
positive  directions  and  another  situated  at  the  origin  with 
length  1  in  all  negative  directions,  constitute  the  feasible 
NET  region.   Should  either  R  or  R   extend  beyond  its  feasi- 
ble region  then  the  rows  in  the  eligible  set  do  not  currently 
form  an  admissible  set  of  network  rows. 

The  reflection  (multiplication  by  -1)  of  a  row  merely 
results  in  the  switching  of  the  V  and  V   vectors  for  the 
row.   That  is,  when  row  i  is  reflected,  the  negative  of  V. 

becomes  V.  and  the  negative  of  V.  becomes  V..   This  in  turn 
l  3  i  l 

will  change  the  vectors  R  and  R  .   In  fact,  it  is  possible 
in  some  cases  that  just  the  reflection  of  rows  in  an  infeasible 
set  may  bring  R   and  R   into  their  feasible  regions  without 
deletion  of  any  rows. 

This  representation  is  used  as  the  basis  for  the 
network  identification  heuristic  for  (NET)  developed  by  the 
author  and  presented  herein.   If  either  R  or  R  extends 
beyond  the  feasible  region,  a  row  penalty  for  each  row  is 
computed  as  the  dot  product  of  V .  and  R  plus  the  dot  product 
of  V.  and  R~.   The  row  with  the  greatest  row  penalty  is 
identified  and  the  revised  penalty  for  that  row,  if  reflected, 
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is  computed.   If  this  reflected  penalty  is  less  than  the 
original  row  penalty,  the  row  is  reflected,  otherwise  it  is 
deleted.   When  both  R  and  R   fall  within  the  feasible  region, 
the  set  of  rows  which  remain  constitutes  an  admissible  net- 
work set. 
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III.   IMPLEMENTATION  OF  AUTOMATIC  NETWORK 
IDENTIFICATION  HEURISTICS 

A.   DOUBLE-GUB  (D-GUB)  ALGORITHM 

The  D-GUB  Algorithm  is  based  on  the  double-GUB  represen- 
tation of  the  network  problem  discussed  in  the  previous 
section.   The  automatic  GUB  algorithm  used  to  find  the  GUB 
sets  is  a  gradient  method  based  on  a  heuristic  for  approxi- 
mate solution  of  certain  0,1  variable  linear  programming 
problems  developed  by  Senju  and  Toyoda  [12]  .   It  was  coded 
by  Thomen  and  proved  to  be  the  most  efficient  of  the  GUB 
identification  algorithms  tested  in  his  research.   A  brief 
description  of  the  GUB  algorithm  will  be  given  here;  it  is 
fully  described  in  [13] . 

The  GUB  heuristic  is  a  two-phase,  one-pass,  non-back- 
tracking algorithm  which  is  feasibility  seeking.   The  initial 
eligible  set  consists  of  all  rows  which  can  be  scaled  to  the 
required  values  according  to  the  scaling  scheme  employed. 
Phase  1  attempts  to  delete  as  few  rows  as  possible  in  order 
to  produce  a  feasible  GUB  set.   Phase  2  examines  the  rows 
deleted  in  Phase  1  and  reincludes  rows  which  do  not  violate 
the  GUB  restriction. 

The  D-GUB  algorithm  uses  a  sequential  application  of  the 
GUB  heuristic  to  the  set  of  network  eligible  rows.   Eligi- 
bility for  inclusion  in  the  first  GUB  set  is  determined  by 
one  of  the  two  scaling  restrictions  described  in  the  previous 
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section.   In  the  first  pass,  the  initial  GUB  set  is  deter- 
mined.  Eligible  rows  not  included  in  the  first  GUB  set  are 
then  examined  to  determine  (under  the  requirement  that  two 
entries  in  a  network  column  be  of  opposite  sign)  if  the  row 
is  eligible  for  inclusion  in  the  second  GUB  set  in  either 
present  or  reflected  form.   If  eligible  rows  are  found,  a 
second  pass  is  made  with  the  GUB  algorithm  producing  a  second 
GUB  set. 

The  D-GUB  Algorithm: 

Step  0.  Determine    Eligible   Rows,      Using  the  scaling  scheme 
desired,  determine  which  rows  of  the  matrix  are  eligible 
for  selection  as  network  rows. 

Step  1.  Find   First    GUB   Set.      Apply  the  GUB  heuristic  to  the 
eligible  set  updating  the  gradient  vector  following 
each  row  deletion. 

Step  2.      Determine    Eligibility    for   Second   GUB   Set.      For  each 
eligible  row  not  included  in  the  first  GUB  set,  check  the 
columns  in  which  the  row  has  non-zero  entries.   In  each 
of  these  columns,  if  the  first  GUB  set  has  no  non-zero 
entries  or  one  non-zero  entry  of  opposite  sign  then  the 
row  is  eligible  for  inclusion  in  the  second  GUB  set  in 
its  present  form.   If  the  first  GUB  set  has  no  non-zero 
entries  or  a  non-zero  entry  of  like  sign  in  each  column, 
then  the  row  is  eligible  for  inclusion  in  reflected  form. 
Otherwise,  the  row  is  not  eligible  and  is  discarded. 
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Step  3.  Find   Second   GUB   Set.      If  there  are  any  rows  eligible 
for  the  second  pass,  reapply  the  GUB  heuristic  to  those 
rows . 
Computational  experience  on  a  number  of  real-world  models 

indicates  that  Phase  2  of  the  GUB  heuristic  rarely  adds 

additional  rows  to  the  GUB  sets  obtained  in  either  pass . 

For  the  second  GUB  set,  Phase  2  was  especially  ineffectual. 

This  suggests  that  the  algorithm,  which  is  already  extremely 

fast,  can  be  made  even  faster  by  the  elimination  of  Phase  2 

with  minimal  loss  of  solution  quality. 

B.   NETWORK  IDENTIFICATION  (NET)  ALGORITHM 

The  NET  Algorithm  is  based  on  the  vector  space  represen- 
tation of  the  network  problem  discussed  in  Section  II.   The 
algorithm  is  two-phased,  one-pass,  and  non-backtracking.   It 
is  a  deletion  heuristic  which  is  feasibility  seeking.   As 
such,  it  begins  with  an  eligible  set  of  rows  which  normally 
do  not  form  an  admissible  network  set  and  attempts  to  delete 
as  few  rows  as  possible  to  obtain  a  feasible  set.   Deleted 
rows  are  then  considered  for  reinclusion  if  they  do  not  violate 
the  feasibility  requirement. 

The  measure  of  inf easibility  at  any  point  is  a  matrix 
penalty  computed  as  the  sum  of  individual  row  penalties.   Rows 
in  the  eligible  set  are  examined  in  order  of  decreasing  row 
penalty  and  either  reflected,  if  the  row  penalty  would  be 
reduced,  or  removed  and  placed  in  a  candidate  set  for  later 
use.   This  guarantees  that  the  matrix  penalty  will  be  reduced 
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at  each  iteration.   Thus,  the  number  of  iterations  in  Phase 
1  is  bounded  by  the  initial  matrix  penalty,  which  is  poly- 
nomially  bounded.   In  Phase  2,    the  rows  in  the  candidate  set 
are  examined  for  reinclusion  in  the  eligible  set  if  they  do 
not  increase  the  matrix  penalty.   Those  not  reincluded  are 
discarded. 

Statement  of  the  Problem: 

Let  A  =  {a .  . }  be  an  m  x n  matrix  with  a..  =  0 , ±1 V  i , j . 
13  13  'J 

Problem:   Find  a  matrix  N  =  {n. .}  with  (m-k)  rows  and  n 
columns  which  is  derived  from  A  by 

1.  Deleting  k  rows  of  A  where  k  >_  0 , 

2.  Multiplying  zero  or  more  rows  of  A  by  -1, 
where  N  has  the  property  that  each  column  of  N  has 
at  most  one  +1  element  and  at  most  one  -1  element. 
We  wish  to  find  a  large  N  in  the  sense  of  containing 
as  many  rows  as  possible,  i.e.,  minimize  k. 

Terminology  and  Notation: 

1.  E  is  the  set  of  row  indices  for  rows  eligible  for  inclusion 
in  N  and  is  called  the  eligible  set. 

2.  C  is  the  set  of  row  indices  for  rows  removed  from  E  in 
Phase  I  (Deletion) .   Some  rows  in  C  may  be  readmitted  to 
E  in  Phase  II.   C  is  called  the  candidate  set. 

3.  The  phrase  "reflect  row  i'  of  A"  means  to  multiply  each 
element  in  row  i'  by  -1,  i.e.,  a...  ■*■  -a.  ,  .  V  j. 

4.  Other  notation  will  be  defined  in  the  algorithm  itself. 
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The  NET  Algorithm: 

Phase  I  -  deletion    of  Infeasible   Rows 

Step  0:  Initialization.       Set  E  =  { 1,  2  ,  .  .  .  ,m}  ,  C  =  <J> . 

For  each  column  j  of  A  compute  the  +  penalty  (K.)  and 

the  -  penalty  (K.)  as  follows: 

K+      =       (      I         1)-1,     KT   =   (       1   )  -  1  . 

J       ieE:a.  . >0  J       ieE:a.  .  <0 

ID  ID 

These  penalties  represent  the  number  of  excess  +1  and  -1 
elements,  respectively,  in  column  j  which  prevent  the 
rows  whose  indices  remain  in  E  from  forming  a  valid  N 
matrix.   A  penalty  value  of  -1  for  K.(K.)  indicates  that 
the  column  does  not  contain  a  +1(-1)  element. 

Step  1:  Define   How   Penalties .   For  every  i  e  E,  compute  a 
row  penalty  (p.)  as  follows: 


Pi   =     I  K+   +     I  K" 

j :a . . >0    J      j :a . . <0    J 

J   13  J   13 


This  is  simply  the  sum  of  +  penalties  for  all  columns  in 
which  row  i  has  a  +1  plus  the  sum  of  -  penalties  for 
all  columns  in  which  row  i  has  a  -1. 

Step  2;  Define   Matrix  Penalty.      Compute  the  penalty  (h) 

for  the  matrix  by  summing  the  row  penalties  as  follows: 
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h  =   I  p.  . 
ieE 


If  h  =  0,  then  go  to  Step  7.   Otherwise,  go  to  Step  3. 

Step  3:  Row   Selection .      Find  the  row  i1  e  E  with  the 
greatest  penalty,  i.e., 

Find  i'  e  E  such  that  p. ,  =  max  p. 

ieE 

(If  there  is  a  tie,  choose  i'  from  among  the  tied  values 
Compute  the  reflected  row  penalty  p.,  for  i'  as  follows: 

p    =     I  (kT+1)   +     I  (K++1)  . 

j:ai,j>0   J         j:aitj<0   -> 

This  would  be  the  row  penalty  for  row  i1  if  it  were  to 
be  reflected. 

Step  4:  delete,    or   Reflect   Row. 

Case   i )   p  .  ,  >_  p  .  .   Let  E  +■  E  -  { i  '  }  ,  C  «-  C  U  { i  '  }  .   Go 

to  Step  5 . 
Case  ii)   p.,  <  p.,.   Reflect  row  i'.   Go  to  Step  6. 

Step  5:  Reduce   column  penalties    as  follows: 

For  all  j  such  that  a...  >  0,  K.  «-  K .  -  1 
J  I'D       3  3 

For  all  j  such  that  a...  <  0,  K.  -*-  K .  - 1 
J  i'3       3  3 

Go  to  Step  1. 
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Step  6:  Change    column   penalties    as  follows: 

Using  the  a. , .  values  after  reflection  of  row  i*, 

For  all  j  such  that  a .  .  .  >  0 ,  K+  +   K+  +  1   and   K~  ■*■   kT  -  1 

i  3  3  3  3  3 

For  all  j  such  that  a.  ,  .  <  0,  K+  «-  K+  -  1   and   kT  *■   K~  +  1 

i  3  3  3  3  3 

Go  to  Step  1. 

Phase  II  -  Reinolusion   of  Rows   from    C 

Step  7.  Eliminate    Conflicting   Rows.      The  rows  with  indices 
in  E,  some  possibly  reflected  from  the  original  A  matrix, 
form  a  valid  N  matrix.   However,  some  of  the  rows  removed 
from  E  and  placed  in  C  may  now  be  reincluded  in  E  if  they 
do  not  make  h  >  0.   Remove  from  C  (and  discard)  all  row 
indices  for  rows  which,  if  reincluded  in  E  in  present  or 
reflected  form,  would  make  h  >  0.   I.e.,  Remove  i  from 
C  if 

a)  3      j,  such  that  a. .   >  0  and  K.   =0 

1  ^1  =>1 

or  a .    <  0  and  K .   =0 
^1  3l 

and 

b)  3     j0  such  that  a. .   >  0  and  K.   =0 

2  ^2  32 

or  a .  .   <  0  and  K .   =0 

m2       32 


If  C  =  <j>,   STOP,  otherwise  go  to  Step  8. 
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Step  8.  Select   Row    for   Reinalusion .      At  this  point  a  row 

from  C  may  be  reincluded  in  E.   There  are  several  possible 
schemes  for  selecting  the  row.   After  the  row  is  rein- 
eluded,  the  column  penalties  are  adjusted.   Then  go  to 
Step  7. 

No  dominating  rule  has  been  discovered  for  breaking  ties 
in  maximum  row  penalty  encountered  in  Step  3.   The  rule  used 
for  the  computational  results  presented  herein  is  to  select 
the  row  with  the  minimum  number  of  non-zero  entries  in  an 
attempt  to  place  a  larger  number  of  non-zero  entries  in  the 
network  set.   Other  possible  rules  are  "first-come,  first- 
served",  maximum  number  of  non-zero  entries,  type  of  con- 
straint, or  modeler  preference. 

Although  the  algorithm  described  above  is  presented  for 
a  matrix  with  strictly  0,±1  entries,  it  can  be  generalized 
to  any  matrix  by  simply  letting  E  be  the  set  of  rows  with 
strictly  0,±1  entries  or  which  can  be  scaled  to  contain  only 
0,±1  entries. 

Prespecified  network  rows  can  also  be  accommodated  with 
the  following  modifications: 

Let  P  =  {i|row  i  is  prespecified}. 

Then  E  •*•   E  -  P. 

After  computation  of  K.  and  K~  in  Step  0,  for  each  column 


if  3i  e  P  such  that  a.  .  =  1   then   K+  «-  K+  +  1, 

ill  3     3 
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if  3  i  e  P  such  that  a.  .  =  -1   then  kT  «-  K~  +  1 . 

ID  3  3 

Rows  in  P  are  not  eligible  for  deletion  or  reflection. 
At  the  termination  of  the  algorithm,  the  rows  in  N  are 
given  by  EUP. 

Computational  experience  on  real-world  models  indicates 
that  Phase  2  of  the  NET  algorithm  is  even  less  productive 
than  that  of  the  GUB  algorithm.   In  only  two  of  sixteen  cases 
were  any  rows  eligible  for  reinclusion  and  the  maximum  number 
eligible  was  three.   This  indicates  that  the  expense  of 
examining  the  rows  in  the  candidate  set  for  eligibility  is 
probably  not  justified  for  the  occasional  small  improvement 
in  quality. 
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IV.   PROBLEM  COMPLEXITY 

Analysis  of  the  inherent  complexity  of  a  problem  can 
reveal  whether  there  is  a  possibility  of  developing  an 
efficient  algorithm  to  completely  solve  all  cases  of  the 
problem.   Unfortunately,  analysis  of  the  NET  problem  indi- 
cates that  it  cannot  be  solved  optimally  by  an  efficient 
algorithm  at  this  time. 

The  complexity  of  a  problem  is  said  to  be  polynomial    if 
an  algorithm  exists  which  can  solve  the  problem  in  a  number 
of  fundamental  operations  bounded  by  a  polynomial  function  of 
intrinsic  problem  dimensions.   Such  an  algorithm  is  called  a 
polynomial  time  algorithm.   An  algorithm  which  is  not  bounded 
by  such  a  polynomial  function  is  said  to  be  an  exponential 
time  algorithm.   The  maximum  solution  time  for  exponential 
time  algorithms  grows  explosively  as  problems  dimensions 
increase. 

The  class  of  intractable    problems  consists  of  those 
problems  for  which  no  polynomial  time  algorithms  exist. 
Between  the  class  of  polynomial  problems  and  the  class  of 
intractable  problems,  there  is  the  class  of  non-deterministic 
■polynomial   problems — class  NP .   This  class  consists  of  those 
problems  for  which  a  guessed  solution  can  be  verified  in  poly- 
nomial time,  but  for  which  the  existence  of  a  polynomial 
time  algorithm  to  solve  such  problems  has  not  yet  been  proved 
or  disproved. 
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A  problem  b  is  said  to  be  reducible  to  a  problem  c  if  a 
polynomial  time  algorithm  for  solving  c  (if  one  exists)  can 
be  used  to  produce,  in  polynomial  time,  a  polynomial  time 
algorithm  for  solving  b.   If  every  problem  of  the  class  NP 
is  reducible  to  the  problem  c,  then  c  is  said  to  be  NP-hard. 
In  addition,  if  c  also  belongs  to  class  NP,  then  c  is  NP- 
oomplete .      This  means  that  should  a  polynomial  time  algorithm 
be  found  which  solves  an  NP-complete  problem,  then  polynomial 
time  algorithms  exist  to  solve  all  problems  in  class  NP .   On 
the  other  hand,  should  an  NP-complete  problem  be  proved 
intractable,  then  all  problems  in  class  NP  are  intractable. 
The  question  is  still  open  at  this  time.   For  a  more  complete 
treatment  of  complexity  theory,  see  [5],  [9]. 

The  problem  of  finding  a  GUB  set  of  specified  size  (i.e., 
number  of  rows)  was  shown  by  Thomen  [13]  to  be  NP-complete, 
while  that  of  finding  a  maximum  GUB  set  was  shown  to  be  NP- 
hard.   As  an  observation,  the  corresponding  maximum  D-GUB 
problem  with  no  scaling,  since  it  represents  a  composition 
of  two  disjoint  GUB  problems,  is  also  NP-complete  (for  a  D-GUB 
set  of  specified  size)  and  NP-hard  (for  a  maximum  D-GUB  set) . 

The  problem  of  complete  conversion  of  an  arbitrary  matrix 
to  the  node-arc  incidence  matrix  of  a  pure  network,  using 
linear  transformation  of  the  matrix  as  well  as  row  and  column 
scaling,  has  been  shown  to  be  polynomial  in  complexity  [1] , 
[10] .   This,  however,  does  not  apply  to  the  problem  of  finding 
the  maximum  embedded  pure  network  should  complete  conversion 
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fail.   Consider  the  NET  problem  for  a  unit  matrix,  that  is, 
a  matrix  containing  only  0,±1  entries.   For  the  present,  no 
scaling  is  allowed.   A  slightly  less  complex  problem  than 
finding  the  maximum  size  embedded  network  set  is  the  following 
problem: 

(NETD)   Given  an  m  *  n  unit  matrix  A  and  an  integer  p  <  m, 
determine  whether  A  contains  a  set  of  p  or  more 
rows  such  that  each  column  of  A  (restricted  to 
those  rows)  has  at  most  two  non-zero  entries, 
where  entries  in  the  same  column  must  be  of  oppo- 
site sign. 

Given  a  set  of  p   rows  from  A,  it  is  easy  to  verify,  in 
polynomial  time,  whether  the  set  satisfies  the  above  cri- 
terion.  Given  an  integer  p  <  m,  it  is  not  easy  to  determine 
whether  there  exists  a  set  of  p  or  more  rows  in  A  which 
satisfies  the  criterion.   In  general,  there  does  not  currently 
exist  an  algorithm  which  can  do  so  in  polynomial  time,  as 
will  now  be  shown. 

In  a  unit  matrix,  two  rows  may  be  said  to  conflict  if  they 
both  contain  a  non-zero  element  of  like  sign  in  a  common 
column.   The  absence  of  such  pairwise  conflicts  in  a  subset 
of  rows  from  A  is  not  a  necessary  condition  for  the  rows  to 
form  a  valid  network  set  if  row  reflection  is  allowed.   How- 
ever, that  condition  is  necessary  and  sufficient  for  that 
purpose  when  no  scaling  is  allowed.   With  no  scaling,  it  is 
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evident  that  the  absence  of  pairwise  conflicts  is  necessary 
in  a  valid  network  set,  for  the  existence  of  a  conflict  vio- 
lates the  opposite  sign  requirement  for  columns  containing 
two  non-zero  elements.   It  is  also  sufficient,  because  the 
violation  of  the  criterion  for  a  valid  network  set  would  re- 
quire at  least  one  column  of  A  to  contain  at  least  two  non- 
zero entries  of  like  sign  in  rows  of  the  set.   This,  in  turn, 
would  imply  that  the  two  rows  in  which  this  occurs  are  in 
conflict. 

Following  closely  the  arguments  presented  by  Thomen  [13] 
for  the  GUB  problem,  a  graph  is  defined  in  which  the  nodes 
represent  the  rows  of  A  and  two  nodes  are  connected  by  an 
edge  if  and  only  if  the  rows  conflict  in  A.   The  problem  of 
finding  a  set  of  p  or  more  rows  in  A  which  do  not  conflict  is 
then  equivalent  to  finding  an  independent  set  of  size  p  or 
more  in  the  graph  so  defined.   This  problem,  known  as  the 
independent  set  decision  problem,  is  known  to  be  NP-complete 
[5] .   Furthermore,  the  problem  of  finding  a  maximum  independent 
set,  and  therefore,  a  maximum  GUB  set  or  network  set,  is  NP- 
hard. 

The  addition  of  row  reflection  to  the  problem  simply  means 
that  each  row  can  exist  in  one  of  two  states,  namely,  unreflected 
or  reflected.   Clearly  then,  in  a  set  of  m  rows,  there  are  2 
distinct  states  for  the  set,  each  corresponding  to  a  different 
subset  of  reflected  rows.   The  problem  of  finding  a  maximum 
network  set  in  A,  allowing  row  reflection,  is  equivalent  to 
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finding  a  maximum  network  set  with  no  scaling  allowed  (shown 
above  to  be  NP-hard)  for  2  distinct  matrices.  As  a  result, 
this  problem  is  also  NP-hard. 

Given  a  general  matrix  in  which  non-zero  entries  may  be 
of  any  magnitude,  and  allowing  simple  row  and  column  scaling, 
the  problem  of  finding  a  maximum  subset  of  rows  which  can  be 
scaled  to  produce  a  pure  network  set  can  be  approached 
(conceptually)  in  at  least  two  ways.   One  approach  is  to 
ignore  the  magnitude  of  non-zero  entries  and  consider  only 
their  sign.   When  a  maximum  size  network  set  is  found  for  the 
resulting  unit  matrix  (an  NP-hard  problem) ,  an  attempt  can 
be  made  to  scale  the  entries  in  the  rows  of  the  network  set  to 
the  required  0,±1  values.   The  scaling,  which  may  or  may  not 
be  successful,  can  be  done  in  polynomial  time,  however,  the 
entire  problem  remains  NP-hard.   Another  approach  is  to 
consider  only  subsets  of  rows  which  can  be  scaled  to  the 
required  unit  values  should  the  subset  be  found  to  be  a  valid 
network  set.   In  essence,  the  scaling  restrictions  applied  in 
the  algorithms  described  in  this  paper  guarantee  that  any 
subset  will  have  the  required  unit  values.   However,  if  this 
approach  excludes  any  subsets  which  may  be  scalable  to  the 
required  values,  the  maximum  size  set  may  be  missed.   Appar- 
ently, even  with  the  addition  of  scaling  considerations,  the 
basic  problem  of  finding  a  maximum  embedded  pure  network  set 
remains  NP-hard. 
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The  above  analysis  of  network  identification  algorithms 
has  only  addressed  the  worst  case  bound.   No  conclusions  can 
be  made  about  the  average  performance  of  an  optimal  algorithm 
In  other  words,  it  may  be  possible  to  develop  an  optimal 
algorithm  with  good  average  performance,  but  having  an 
exponential  worst  case  bound. 
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V.   UPPER  BOUNDS  ON  MAXIMUM  NETWORK  SET  SIZE 

The  problem  of  finding  a  maximum  size  pure  network  set 
of  rows  in  a  matrix,  regardless  of  scaling  restrictions ,  has 
been  shown  to  be  NP-hard.   This  also  applies  to  the  problem 
of  determining  the  size  of  a  maximum  set.   Upper  bounds  on 
the  maximum  set  size,  computed  in  polynomial  time,  can  be 
useful  in  evaluating  the  quality  of  network  sets  produced 
by  heuristic  algorithms. 

The  upper  bounds  on  maximum  pure  network  set  size  depend 
on  the  scaling  restrictions  employed.   It  should  be  noted 
that  the  bounds  computed  here  apply  to  the  maximum  set  size 
obtainable  from  the  set  of  eligible  rows.   They  do  not  apply, 
in  general,  to  the  maximum  set  size  obtainable  from  the  entire 
coefficient  matrix. 

Clearly,  the  maximum  set  size  can  be  no  greater  than  the 
number  of  rows  in  the  eligible  set,  but  this  bound  is  of 
little  practical  use.   Better  bounds  can  be  constructed  using 
information  already  available  in  the  heuristic  procedure. 
Two  upper  bounds  were  developed  as  a  result  of  this  research 
and  are  presented  here. 

The  first  bound  follows  directly  from  the  restriction  that 
each  column  of  the  matrix  (restricted  to  the  eligible  set)  is 
allowed  at  most  two  non-zero  entries.   If  k  represents  the 
maximum  number  of  non-zero  entries  in  any  column  of  A 
(considering  only  entries  in  eligible  rows) ,  then  it  is  clear 
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that  at  least  k-2  rows  must  be  deleted  from  the  eligible 
set  in  order  to  make  this  "worst  column"  feasible.   Since  the 
column  counts  are  readily  available  in  the  form  of  the  column 
penalties  (K.  and  K.),  the  upper  bound  on  the  network  set 
size  for  a  matrix  with  m  eligible  rows  is: 


u,   =  m  -  max  (K  .  +  K  . )  . 
1  j     3        1 

This  bound  is  evidently  sharp    in  that  matrices  can  be  con- 
structed for  which  it  is  achieved. 

The  second  bound  is  tighter,  but  requires  more  information 
about  the  problem  and  more  computation.   It  is  based  on  a 
matrix  penalty  computed  from  column  penalties,  rather  than 
row  penalties  as  in  the  NET  algorithm.   This  penalty  is 
defined  as: 

h   =      y      k+   +      y      kT    . 

■     1  3 

j:K+>0         j:K->0 
3  J  3 

Clearly,  as  long  as  H  >  0,  the  rows  remaining  in  the 
eligible  set  do  not  form  a  valid  network  set.   The  reflection 
of  a  row  in  the  eligible  set  may  decrease  H,  increase  H,  or 
leave  it  unchanged.   The  deletion  of  a  row  from  the  eligible 
set  may  decrease  H,  or  leave  it  unchanged.   The  actual  effect 
of  a  reflection  or  deletion  depends  on  the  rows  remaining  in 
the  eligible  set  and  their  state  (unreflected  or  reflected) 
at  the  time.   However,  it  is  possible  to  compute  for  each  row 
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the  maximum  possible  reduction  in  H  obtainable  by  reflection 
or  deletion  of  the  row,  regardless  of  the  other  rows  remaining 
in  the  eligible  set.   These  maximum  possible  reductions  are 
called  the  reflection   potential    and  deletion   potential    for 
the  row,  respectively. 

The  bound  will  be  computed  by  determining  the  minimum 
number  of  row  deletions  necessary  to  reduce  H  to  zero .   This 
cannot,  of  course,  be  computed  exactly;  however,  the  result 
will  be  conservative  in  that  it  will  guarantee  that  at  least 
that  number  of  rows  must  be  deleted. 

Consider  the  possible  states  of  a  column  j  of  A  in  which 
row  i  has  a  non-zero  entry  (i.e.,  a. .  ^  0).   The  six  possible 
cases  are  summarized  in  Table  1. 

Case  K£  Ku 

2 1_ 

1  0-1 

2  0  0 

3  0  >0 

4  >0  -1 

5  >0  0 

6  >0  >0 

K.   =  column  penalty  of  like  sign  to  a. . 
(K+  if  a±j  >  0;  K~  if  a±.  <  0) 

K1?  =   column  penalty  of  unlike  sign  to  a .  . 

Table  1 
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The  non-zero  entries  in  each  column  are  counted  only 
when  they  occur  in  the  initial  eligible  set.   The  penalties 
used  are  those  computed  before  any  row  reflections  or  dele- 
tions have  occurred. 

Consider  first  the  effect  on  column  j,  and  thus  H,  of 
reflecting  row  i.   In  cases  1,  5,  and  6,  reflection  of  row  i 
would  not  change  H.   In  case  4,  reflection  of  row  i  would 
decrease  H  by  1,  unless  another  row  with  a  non-zero  in  column 
j  was  previously  reflected.   In  cases  2  and  3,  reflection  of 
row  i  would  actually  increase  H  by  1,  unless  enough  other 
rows  with  non-zero  entries  in  column  j  were  reflected  or 
deleted  to  produce  a  -1  value  for  K..   Since  we  cannot  be 
sure  that  reflection  in  cases  2  and  3  would  actually  increase 
H,  we  must  consider  H  unchanged  by  reflection  in  these  cases. 
In  summary,  we  allow  H  to  be  decreased  only  by  reflection 
of  rows  with  non-zero  entries  in  columns  exhibiting  case  4. 
The  reflection  potential  for  row  i  is  computed  by  summing 
the  effects  for  each  column  in  which  row  i  has  a  non-zero 
element,  with  the  condition  that  only  one  row  reflection  is 
allowed  to  decrease  H  for  each  column  exhibiting  case  4. 

Row  deletions  provide  greater  opportunity  for  reducing  H. 
In  cases  1  and  2,    deletion  of  row  i  has  no  effect  on  H,  while 
in  cases  4,  5,  and  6,  deletion  of  row  i  directly  decreases  H 
by  1.   In  case  3,  deletion  of  row  i  does  not  directly  decrease 
H,  but  it  allows  reflection  of  another  row  with  a  non-zero 
in  column  j,  producing  a  net  decrease  of  1  in  the  value  of 
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H.   In  summary,  we  allow  H  to  be  decreased  by  deletion  of 
rows  with  non-zero  entries  in  columns  exhibiting  case  3,  4, 
5,  or  6.   The  deletion  potential  for  row  i  is  computed  by 
summing  the  effects  for  each  column  in  which  row  i  has  a 
non-zero  entry. 

To  obtain  the  second  bound,  the  reflection  and  deletion 
potentials  for  each  row  in  the  eligible  set  are  computed. 
Then  the  maximum  possible  reduction  of  H  by  row  reflections 
alone  is  computed  by  summing  the  individual  row  reflection 
potentials.   If  H  >  0  at  this  point,  then  rows  must  be 
deleted.   Rows  are  deleted  in  order  of  decreasing  deletion 
potential  until  H  <  0.   The  upper  bound  is  then  computed  as 


Up   =  m  -  number  of  rows  deleted, 


where  m  is  the  number  of  rows  in  the  initial  eligible  set 

This  bound  is  evidently  sharp,  since  examples  can  be 
constructed  which  satisfy  the  bound  exactly. 
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VI.   COMPUTATIONAL  RESULTS 

The  D-GUB  and  NET  algorithms  were  coded  in  FORTRAN  IV  and 
were  tested  on  16  large-scale ,  real-world  models.   The  sizes 
of  the  models  ranged  from  small  (90  rows,  177  columns)  to 
large  (3500  rows,  6500  columns).   Table  Al  of  Appendix  A 
provides  the  characteristics  of  each  model. 

The  results  obtained  for  the  D-GUB  algorithm  are  given 
in  Table  A2  of  Appendix  A.   The  row  eligibility  criterion 
used  for  the  results  reported  was  that  the  row  contain  only 
0,±1  entries,  or  be  able  to  be  scaled  to  0,±1  entries  by  row 
scaling  only.   The  number  of  eligible  rows  as  a  fraction  of 
the  total  row  count  ranged  from  9%  to  100%  (the  objective 
row(s)  not  being  eligible  in  any  case).   The  number  of  GUB 
rows  obtained  in  each  pass  is  indicated.   In  two  cases,  the 
entire  eligible  set  was  determined  to  be  a  GUB  set,  so  no 
second  pass  was  required.   The  times  given  are  in  CPU  seconds 
for  the  IBM  360/67  with  the  program  compiled  using  FORTRAN  H 
(Extended)  with  code  optimization  (OPT  =  2) . 

The  results  for  the  NET  algorithm  are  given  in  Table  A3 
of  Appendix  A.   Also  included  are  the  upper  bounds  on  the 
maximum  pure  network  set  size  computed  from  the  problem  data. 
The  times  given  for  determining  the  eligible  set  should  be 
nearly  the  same  as  those  for  the  D-GUB  algorithm  since  the 
same  eligibility  criterion  and  code  were  used  in  both  cases. 
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The  eligibility  of  rows  in  the  candidate  set  for  reinclusion 
in  Phase  2  was  determined,  but  Phase  2  was  not  included  due 
to  the  absence  of  eligible  rows  in  nearly  every  case.   The 
solution  time  does  not  include  the  time  required  to  determine 
eligibility  for  Phase  2.   The  times  given  are  again  in  CPU 
seconds  on  the  IBM  36  0/67.   The  NET  quality  value  is  the  number 
of  rows  in  the  network  set,  expressed  as  a  percentage  of  the 
best  known  upper  bound  on  the  pure  network  set  size.   As 
explained  earlier,  the  actual  maximum  network  set  size  is,  in 
general,  unknown  and  thus  the  actual  NET  quality  may  be  better 
than  this  conservative  estimate.   In  particular,  the  bounds 
are  almost  certainly  too  high  for  problems  with  a  large  number 
of  eligible  rows  (e.g.,  PAPER)  and  for  problems  with  dense, 
unstructured  coefficient  matrices  (e.g.,  TRUCK). 

The  overall  results  obtained  are  very  encouraging.   The 
algorithms  are  very  fast  (especially  when  compared  with 
computer  time  expended  in  any  attempt  to  solve  these  large 
problems)  and  they  consistently  produce  maximum  or  near 
maximum  pure  network  sets  (from  the  eligible  rows)  as 
evidenced  by  the  upper  bounds .   Work  is  underway  to  include 
one  or  both  algorithms  as  a  part  of  a  state-of-the-art 
optimization  system  which  will  provide  results  concerning  the 
value  of  the  factorizations  obtained  in  improving  the  solution 
times  for  the  models. 
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VII.   EXTENSIONS 

The  current  weakness  in  this  research  appears  to  lie 
primarily  in  the  area  of  scaling.   Many  problems  exhibiting 
intrinsic  network  structure  are  disguised  by  their  formulation 
and  resist  the  simplistic  attempts  used  here  to  rescale  them. 
In  particular,  the  COAL  model  is  known  to  be  a  complete 
network,  if  appropriately  scaled,  but  it  is  not  evident  how 
this  is  to  be  discovered  using  general,  problem- independent 
automatic  identification.   Methods  used  to  scale  an  entire 
matrix  to  0,±1  values  (see  [1],  [10])  can  be  attempted,  but 
failing  complete  conversion  the  next  step  is  not  evident. 
This  remains  a  potentially  fruitful  area  for  further  research. 

Another  potentially  profitable  area  is  the  extension  of 
the  methods  developed  for  the  network  identification  and 
associated  upper  bounds  to  other  types  of  special  structure. 
Schrage  [11]  has  reported  research  in  this  area  using  the 
conflict  method  of  Greenberg  and  Rarick  [7] .   It  is  felt 
that  the  superior  speed  of  the  gradient  method  exhibited  in 
the  GUB  factorization  [13]  and  seen  here  in  the  network 
factorization  would  carry  over  to  the  identification  of  other 
special  structures.   This  approach  is  currently  being  pursued. 

Finally,  the  possible  use  of  the  automatic  identification 
algorithms  in  discovering  special  structure  which  can  be 
used  in  the  automatic  decomposition  of  very  large  models 
looks  extremely  promising.   If  the  algorithms  can  be  used 
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to  identify  subproblems  connected  by  a  few  coupling  constraints, 
then  a  decomposition  of  the  problem  into  smaller  problems, 
each  with  a  special  structure  (e.g.,  transportation  or  assign- 
ment problems),  may  be  possible.   The  solution  of  these 
smaller  problems  using  very  efficient,  specialized  algorithms 
may  assist  in  the  solution  of  the  larger  problem.   One  class 
of  such  models  is  that  of  large  multi-commodity  production- 
transportation  problems. 
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VIII.   CONCLUSIONS 

The  stated  purpose  of  this  research  was  to  extend  the 
results  obtained  for  automatic  GUB  identification  in  large- 
scale  models  to  embedded  pure  network  identification.   As 
was  the  case  with  the  GUB  problem,  the  maximum  embedded  pure 
network  identification  was  shown  to  be  NP-hard  in  complexity. 
Therefore,  heuristic  algorithms  were  developed  and  used  to 
identify  large  pure  network  sets  in  polynomial  time. 

With  regard  to  accomplishing  the  intended  purpose,  the 
research  was  very  successful.   Using  restrictions  on  row 
eligibility  to  overcome  the  inherent  scaling  problems,  the 
algorithms  proved  extremely  efficient  in  identifying  large 
pure  network  sets.   Upper  bounds  were  developed  for  maximum 
pure  network  set  size  and  were  used  to  evaluate  the  effec- 
tiveness of  the  algorithms.   The  results  on  problem  complexity 
were  obtained  and,  as  expected,  justify  the  use  of  heuristic 
algorithms  to  approximate  optimal  solution  of  what  would 
otherwise  be  an  extremely  difficult  problem. 

The  unexpected  benefit  of  this  research  appears  to  be 
the  applicability  of  both  the  general  NET  identification  pro- 
cedure and  the  upper  bound  computations  to  many  other  types 
of  special  structures.   Research  is  continuing  in  this  area. 


44 


APPENDIX  A 

This  appendix  contains  three  tables  describing  the 
linear  and  mixed  integer  programming  models  used  to  test 
the  algorithms  described  in  this  paper  and  the  computational 
results  of  the  algorithms  applied  to  the  models.   The  content 
of  the  tables  is: 

TABLE  Al:   Sample  LP  (MIP)  Model  Characteristics 
TABLE  A2:   D-GUB  Algorithm  Computational  Results 
TABLE  A3:   NET  Algorithm  Computational  Results 
The  execution  times  provided  in  the  tables  are  expressed  in 
CPU  seconds  on  an  IBM  360/67  with  the  programs  compiled 
using  FORTRAN  IV  H  (Extended) .   Further  description  of  the 
data  in  the  tables  is  provided  in  Section  VI ,  Computational 
Results. 
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